Iterative Construction of Hierarchical Classifiers for Phishing Website Detection

Authors

  • Jemal H. Abawajy
  • Gleb Beliakov
  • Andrei V. Kelarev
  • Morshed U. Chowdhury
Abstract

This article presents a new iterative construction of hierarchical classifiers in SimpleCLI for the detection of phishing websites. The construction creates ensembles of ensembles in SimpleCLI by iteratively linking a top-level ensemble to a middle-level ensemble instead of a base classifier, so that the top-level ensemble can generate a large multilevel system. This makes such large systems easy to set up and run in SimpleCLI. The article concentrates on the performance of the iterative construction, using the detection of phishing websites as the example application. We carried out systematic experiments evaluating several essential ensemble techniques, as well as more recent approaches, and studied their performance as parts of the iterative construction of hierarchical classifiers. The results demonstrate that the iteratively constructed hierarchical classifiers performed better than the base classifiers and standard ensembles. This application to the classification of phishing websites shows that combining diverse ensemble techniques in the iterative construction of hierarchical classifiers can increase performance in situations where the data can be processed on a large computer.

Keywords: phishing websites, ensemble classifiers, hierarchical multi-level classifiers, Random Forest
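
The idea of linking an ensemble to another ensemble instead of a base classifier can be illustrated with a minimal, self-contained sketch. It does not reproduce the authors' SimpleCLI setup or their choice of ensemble techniques: bagging is used here only as a stand-in, the toy data is synthetic, and the names `Stump`, `Bagging`, and `build_hierarchy` are hypothetical.

```python
import random
from collections import Counter


class Stump:
    """Base classifier: a single-feature threshold rule (illustrative stand-in)."""

    def fit(self, X, y):
        best = None
        for f in range(len(X[0])):
            values = sorted({row[f] for row in X})
            # Candidate cuts: one below all values, then midpoints between neighbours.
            cuts = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
            for cut in cuts:
                for sign in (1, -1):
                    hits = sum((sign * (row[f] - cut) > 0) == bool(label)
                               for row, label in zip(X, y))
                    if best is None or hits > best[0]:
                        best = (hits, f, cut, sign)
        _, self.f, self.cut, self.sign = best
        return self

    def predict(self, X):
        return [int(self.sign * (row[self.f] - self.cut) > 0) for row in X]


class Bagging:
    """Ensemble: trains members on bootstrap samples and takes a majority vote.

    `make_member` may build a base classifier or another ensemble.
    """

    def __init__(self, make_member, n_members, rng):
        self.make_member = make_member
        self.n_members = n_members
        self.rng = rng

    def fit(self, X, y):
        self.members = []
        for _ in range(self.n_members):
            idx = [self.rng.randrange(len(X)) for _ in range(len(X))]
            member = self.make_member()
            self.members.append(member.fit([X[i] for i in idx], [y[i] for i in idx]))
        return self

    def predict(self, X):
        per_member = [m.predict(X) for m in self.members]
        return [Counter(votes).most_common(1)[0][0] for votes in zip(*per_member)]


def build_hierarchy(levels, n_members=3, seed=0):
    """Iteratively wrap the previous level's factory in a new ensemble level."""
    rng = random.Random(seed)
    factory = Stump  # level 0: the base classifier
    for _ in range(levels):
        # Bind the current factory as a default argument so each level keeps its own.
        factory = lambda inner=factory: Bagging(inner, n_members, rng)
    return factory()


# Synthetic toy data (not phishing data): the label depends on feature 0 only.
X = [[i / 100, (i * 7 % 20) / 20] for i in range(20)]
X += [[0.8 + i / 100, (i * 7 % 20) / 20] for i in range(20)]
y = [0] * 20 + [1] * 20

model = build_hierarchy(levels=2).fit(X, y)
accuracy = sum(p == t for p, t in zip(model.predict(X), y)) / len(y)
print(f"training accuracy of the two-level system: {accuracy:.2f}")
```

With `levels=2` and `n_members=3`, the top-level ensemble holds three middle-level ensembles, each holding three base classifiers, so the number of base classifiers grows as `n_members ** levels`. This growth is one way to read the abstract's remark that the approach suits data that can be processed on a large computer.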

Similar resources

Phishing website detection using weighted feature line embedding

The aim of phishing is to obtain users' private information without their permission by designing a new website that mimics a trusted website. Information technology specialists do not agree on a single definition of the discriminative features that characterize phishing websites. Therefore, the number of reliable training samples in phishing detection problems is limited. M...

Phishing Website Detection based on Supervised Machine Learning with Wrapper Features Selection

The problem of Web phishing attacks has grown considerably in recent years, and phishing is considered one of the most dangerous Web crimes, one that may have severe negative effects on online business. In a Web phishing attack, the phisher creates a forged or phishing website to deceive Web users in order to obtain their sensitive financial and personal information. Several conventiona...

An Associative Classification Data Mining Approach for Detecting Phishing Websites

Phishing websites are fake websites created by dishonest people to mimic the webpages of real websites. Victims of phishing attacks may expose their sensitive financial information to the attacker, who might use this information for financial and criminal activities. Various approaches have been proposed to detect phishing websites, among which, approaches that utilize data mining techniqu...

Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification

Nowadays, malicious URLs are a common threat to businesses, social networks, net-banking, etc. Existing approaches have focused on binary detection, i.e. whether the URL is malicious or benign. Very little literature has focused on the detection of malicious URLs together with their attack types. Hence, it becomes necessary to know the attack type and adopt an effective countermeasure. This pa...

Associative Classification Mining for Website Phishing Classification

Website phishing is one of the crucial research topics for the internet community due to the massive number of daily online transactions. Predicting phishing activity for a website is a typical classification problem in data mining, where different website features such as URL length, prefix and suffix, IP address, etc., are used to discover concealed correlations (knowledg...

Journal title:
  • JNW

Volume 9

Publication date: 2014